In-Place Longest Common Extensions
Abstract
Longest Common Extension (LCE) queries are a fundamental subroutine in many string-processing algorithms, including (but not limited to) suffix sorting, string matching, and the identification of palindrome factors and repeats. An LCE query takes as input two positions i, j in a text T ∈ Σ^n and returns the length l of the longest common prefix of the i-th and j-th suffixes of T. Clearly, we can store T in n⌈log₂ |Σ|⌉ bits and answer LCE queries in O(l) time by directly comparing the two suffixes. This solution also has the advantage of supporting optimal-time text extraction. In this paper, we prove the following (somewhat surprising) result: in the RAM model, n⌈log₂ |Σ|⌉ bits of space are sufficient to support deterministic O(log² l)-time LCE queries and optimal-time text extraction. LCE query times can be improved to O(log l) by adding only O(log n) words to the space usage. In other words, we can replace the (plain) text with a data structure of the same size that supports exponentially faster LCE queries without penalizing text extraction times. Importantly, our structure can be built in O(n log n) expected time and linear space, and is therefore also practical. By applying our techniques to the suffix sorting problem, we obtain (i) a novel in-place suffix array construction algorithm and (ii) the first efficient in-place solution for the sparse suffix sorting problem.
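The O(l)-time baseline mentioned in the abstract (direct comparison of the two suffixes) can be sketched as follows. This is only an illustration of that baseline, not the data structure proposed in the paper; the function name `naive_lce` and the use of 0-based positions are assumptions made for this example.

```python
def naive_lce(text: str, i: int, j: int) -> int:
    """Return the length of the longest common prefix of text[i:] and text[j:].

    Positions are 0-based (an assumption of this sketch). The loop performs
    one character comparison per matched position, so it runs in O(l) time,
    where l is the returned length.
    """
    n = len(text)
    length = 0
    # Extend the match while both suffixes still have characters and agree.
    while (i + length < n and j + length < n
           and text[i + length] == text[j + length]):
        length += 1
    return length


if __name__ == "__main__":
    t = "abracadabra"
    print(naive_lce(t, 0, 7))  # suffixes "abracadabra" and "abra"
    print(naive_lce(t, 0, 3))  # suffixes "abracadabra" and "acadabra"
```

The sketch uses no space beyond the text itself, which is the property the paper keeps while reducing the query time from O(l) to polylogarithmic in l.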
Similar resources
Modifications of the Landau-Vishkin Algorithm Computing Longest Common Extensions via Suffix Arrays and Efficient RMQ computations
Approximate string matching is an important problem in Computer Science. The standard solution for this problem is an O(mn) running time and space dynamic programming algorithm for two strings of length m and n. Landau and Vishkin developed an algorithm which uses suffix trees for accelerating the computation along the dynamic programming table and reaching space and running time in O(nk), wher...
A Modification of the Landau-Vishkin Algorithm Computing Longest Common Extensions via Suffix Arrays
Approximate string matching is an essential problem in many areas related to Computer Science including biological sequence processing. The standard solution of this problem is an O(mn) running time and space dynamic programming algorithm for two strings of length m and n. Landau and Vishkin developed an algorithm which uses suffix trees for accelerating the computation along the dynamic progra...
Longest Common Extensions in Sublinear Space
The longest common extension problem (LCE problem) is to construct a data structure for an input string T of length n that supports LCE(i, j) queries. Such a query returns the length of the longest common prefix of the suffixes starting at positions i and j in T. This classic problem has a well-known solution that uses O(n) space and O(1) query time. In this paper we show that for any trade-of...
Using longest common subsequence and character models to predict word forms
This paper presents an algorithm for automatic word forms inflection. We use the method of longest common subsequence to extract abstract paradigms from given pairs of basic and inflected word forms, as well as suffix and prefix features to predict this paradigm automatically. We elaborate this algorithm using combination of affix feature-based and character ngram models, which substantially en...
Extensions of Some Fixed Point Theorems for Weak-Contraction Mappings in Partially Ordered Modular Metric Spaces
The purpose of this paper is to establish fixed point results for a single mapping in a partially ordered modular metric space, and to prove a common fixed point theorem for two self-maps satisfying some weak contractive inequalities.
Journal title:
- CoRR
Volume abs/1608.05100
Publication date 2016